The aim of this paper is to check the feasibility of using the maximal-entropy random walk in algorithms
finding communities in complex networks. A number of such algorithms exploit an ordinary
or a biased random walk for this purpose. Their key part is a (dis)similarity matrix, according to
which nodes are grouped. This study encompasses the use of the stochastic matrix of a random
walk, its mean first-passage time matrix, and a matrix of weighted paths count. We briefly indicate
the connection between those quantities and propose substituting the maximal-entropy random walk
for the previously chosen models. This unique random walk maximises the entropy of ensembles
of paths of given length and endpoints, which results in equiprobability of those paths. We compare
the performance of the selected algorithms on LFR benchmark graphs. The results show that the
change in performance depends very strongly on the particular algorithm, and can lead to slight
improvements as well as significant deterioration.

PACS: 89.75.Hc, 05.40.Fb, 02.50.Ga, 89.70.Cf
I. INTRODUCTION

Relationships between entities can be represented as a graph structure upon which some process takes place, be
it information or opinion spread on social networks, including citation and collaboration networks, WWW or the
Internet, or perhaps a physical process (molecule movement) on physical or biological networks. One natural
question to ask is whether there are groups of entities which are more strongly connected to each other than to the rest of
the network. Due to the sociological legacy, these are called communities, but they can comprise researchers, websites,
genes or transcription factors alike.

A plenitude of methods have been devised to find such communities, and a plenitude of definitions have been
conceived to tell what it is that we really look for. These definitions and methods have been thoroughly reviewed in [?].
A particular subgroup of the algorithms is based on random walks (RWs), since intuitively a random walker is
expected to spend a longer time inside well-connected graph regions, and there should be only a slim chance that it
crosses from one to another.

The most common choice for such algorithms has been the well-known random walk defined by equal probabilities
of going from a node to any of its nearest neighbours, which we call the generic random walk (GRW). In contrast,
the maximal-entropy random walk (MERW) ensures equiprobability of all paths of a given length and endpoints. Although
for many problems GRW and biased RWs are often more suitable, MERW deserves particular interest: while GRW
maximises entropy locally (entropy of the nearest-neighbour selection), MERW maximises entropy globally (entropy
of the path selection) [?]. Among its curious behaviours, MERW exhibits localization of its stationary distribution
on diluted lattices [?] and Cayley trees [?]; it also relaxes extremely fast on these trees [?], while it does so very
slowly between two identical connected regions [?]. Thus, we believe MERW can serve alongside GRW as a null model
of random processes on networks.

It is noteworthy that equiprobable paths (as generated by MERW) are the natural choice for an ensemble used in
Feynman path integrals (e.g. discrete quantum gravity models with curved space-time) [?] or in the optimal sampling
algorithm in the path-integral Monte Carlo methods [?]. Entropy maximization is a global principle much like the
least action principle. It has earlier led to the biological concept of evolutionary entropy [10]. Interestingly, the value
of entropy for a given graph, as defined by MERW, has been found useful for selection of robust networks [11]. Finally,
it has begun to be used in the study of complex networks [12]-[16].
II. GENERIC AND MAXIMAL-ENTROPY RANDOM WALKS



Let us consider a discrete-time random walk on a finite connected undirected graph, whose stochastic matrix $P$
is constant in time. An element $P_{ij}$ of this matrix encodes the probability that a walker that stands on a node $i$
at time $t$ hops to a node $j$ at time $t + 1$. These matrix elements fulfil the condition $\sum_j P_{ij} = 1$ for all $i$, which means
that the number of walkers is conserved. An additional assumption allows the walkers to hop only to a neighbouring
node. This can be formulated as $P_{ij} \le A_{ij}$, where $A_{ij}$ is the corresponding element of the adjacency matrix $A$ of the
graph: $A_{ij} = 1$ if $i$ and $j$ are neighbours, and $A_{ij} = 0$ otherwise.

For any time $t$, the probability of a walker staying on a given vertex of the graph is encoded in the vector
$\pi(t) = (\pi_1(t), \ldots, \pi_N(t))^T$. The initial distribution of particles is $\pi(0)$, and the distribution after $t$ steps is
$\pi(t)^T = \pi(0)^T P^t$. A quantity of interest is the stationary probability distribution, which we assume exists. Then it is
given by a solution of

$$\pi^T = \pi^T P , \tag{1}$$

and may be regarded as the probability distribution after infinite time.
GRW is realised by the following stochastic matrix:

$$P_{ij} = \frac{A_{ij}}{k_i} , \tag{2}$$

where $k_i = \sum_j A_{ij}$ denotes the node degree. The factor $1/k_i$ in the above formula produces uniform probability of
selecting one of the $k_i$ neighbours of the node $i$. This choice maximises the entropy of neighbour selection and corresponds
to the standard Einstein-Smoluchowski-Polya random walk. The stationary probability distribution of GRW is given
by $\pi_i = k_i / \sum_j k_j$.
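
As a minimal sketch of Eq. (2), the GRW matrix and its stationary distribution can be built directly from an adjacency matrix. The toy graph (a triangle with one pendant node) and the function names `grw_matrix` and `grw_stationary` are illustrative assumptions, not part of the original study.

```python
import numpy as np

# Hypothetical toy graph: a triangle (nodes 1, 2, 3) with a pendant node 0.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

def grw_matrix(A):
    """GRW stochastic matrix of Eq. (2): P_ij = A_ij / k_i."""
    k = A.sum(axis=1)              # node degrees k_i
    return A / k[:, None]

def grw_stationary(A):
    """Stationary distribution of GRW: pi_i = k_i / sum_j k_j."""
    k = A.sum(axis=1)
    return k / k.sum()

P_grw = grw_matrix(A)
pi_grw = grw_stationary(A)
```

Here `pi_grw @ P_grw` reproduces `pi_grw`, which is the stationarity condition (1).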

The other type of random walk, MERW, is defined by a stochastic matrix that maximises entropy of a set of
trajectories with a given length and end-points. This is a global principle similar to the least action principle. It leads
to the following stochastic matrix:

$$P_{ij} = \frac{A_{ij}}{\lambda_0} \frac{\psi_{0j}}{\psi_{0i}} , \tag{3}$$

where $\lambda_0$ is the largest eigenvalue of the adjacency matrix $A$, and $\psi_{0i}$ is the $i$-th element of the corresponding
eigenvector $\psi_0$. By virtue of the Frobenius-Perron theorem all elements of this vector are of the same sign, because
the adjacency matrix $A$ is irreducible. For a stochastic matrix to maximise the entropy of an ensemble of paths the
choice (3) is unique.
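
The construction in Eq. (3) can be sketched numerically as follows. This is only an illustration under stated assumptions: the graph is connected and undirected (so the symmetric eigensolver applies), and the toy graph and the name `merw_matrix` are hypothetical.

```python
import numpy as np

def merw_matrix(A):
    """MERW stochastic matrix of Eq. (3): P_ij = (A_ij / lam0) * psi0_j / psi0_i."""
    A = np.asarray(A, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(A)     # symmetric A; eigenvalues in ascending order
    lam0 = eigvals[-1]                       # largest eigenvalue
    psi0 = np.abs(eigvecs[:, -1])            # Perron vector; abs() only fixes the overall sign
    P = A * psi0[None, :] / (lam0 * psi0[:, None])
    return P, lam0, psi0

# Hypothetical toy graph: a triangle (nodes 1, 2, 3) with a pendant node 0.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

P_merw, lam0, psi0 = merw_matrix(A)
```

The rows of `P_merw` sum to one because $\sum_j A_{ij} \psi_{0j} = \lambda_0 \psi_{0i}$.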

The defining condition of entropy maximization leads to equiprobability of paths. More precisely, let us take a
sequence of nodes $\gamma_{a_0 a_\tau} = (a_0, a_1, \ldots, a_\tau)$, which is a path of $\tau$ steps with the initial node $a_0$ and the final node $a_\tau$.
The probability of visiting this sequence of nodes is

$$P(\gamma_{a_0 a_\tau}) = P_{a_0 a_1} P_{a_1 a_2} \cdots P_{a_{\tau-1} a_\tau} , \tag{4}$$

which results from the Markov property of the random walk. Upon substitution of MERW's stochastic matrix one
obtains

$$P(\gamma_{a_0 a_\tau}) = \frac{1}{\lambda_0^\tau} \frac{\psi_{0 a_\tau}}{\psi_{0 a_0}} , \tag{5}$$

which depends only on the number of steps and the two ending points, but is independent of the intermediate nodes.
This is what we mean by equal probability of paths of a given length and end-points. Consequently, the probability
measure on this ensemble of paths is uniform, and its entropy is maximal.
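
The equiprobability statement can be checked by brute force on a small graph. The sketch below enumerates every path of a fixed length between two fixed nodes and compares each path probability with the right-hand side of Eq. (5). The toy graph, the endpoints, the path length and the helper name `path_probabilities` are assumptions made for illustration.

```python
import itertools
import numpy as np

# Hypothetical toy graph: a triangle (nodes 1, 2, 3) with a pendant node 0.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

eigvals, eigvecs = np.linalg.eigh(A)
lam0, psi0 = eigvals[-1], np.abs(eigvecs[:, -1])
P = A * psi0[None, :] / (lam0 * psi0[:, None])      # MERW matrix, Eq. (3)

def path_probabilities(P, A, start, end, tau):
    """Probabilities of all tau-step paths from start to end, computed as in Eq. (4)."""
    n = len(A)
    probs = []
    for middle in itertools.product(range(n), repeat=tau - 1):
        path = (start, *middle, end)
        steps = list(zip(path[:-1], path[1:]))
        if all(A[i, j] > 0 for i, j in steps):       # keep only sequences that follow edges
            probs.append(np.prod([P[i, j] for i, j in steps]))
    return np.array(probs)

tau = 4
probs = path_probabilities(P, A, start=0, end=2, tau=tau)
expected = psi0[2] / (lam0 ** tau * psi0[0])         # right-hand side of Eq. (5)
```

All entries of `probs` coincide with `expected`, whichever intermediate nodes a path visits.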
The stationary state of MERW is given by the Shannon-Parry measure [17]:

$$\pi_i = \psi_{0i}^2 . \tag{6}$$

The last formula forms a connection between MERW and quantum mechanics, since $\psi_{0i}$ can be understood as the
wave function of the ground state of the operator $-A$ and $\psi_{0i}^2$ as the probability of finding a particle in this state
[?]. The two types of random walk, (2) and (3), behave identically on $k$-regular graphs. In general, however,
they have completely disparate properties.
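
Both statements, the stationary state (6) and the coincidence of the two walks on regular graphs, can be verified numerically. The sketch below assumes a unit-norm eigenvector $\psi_0$ (so that $\psi_{0i}^2$ sums to one); the toy graphs and the helper name `merw` are hypothetical.

```python
import numpy as np

def merw(A):
    """Return the MERW matrix of Eq. (3) and the stationary state of Eq. (6)."""
    vals, vecs = np.linalg.eigh(A)
    lam0, psi0 = vals[-1], np.abs(vecs[:, -1])
    P = A * psi0[None, :] / (lam0 * psi0[:, None])
    return P, psi0 ** 2          # eigh returns unit-norm vectors, so psi0**2 sums to 1

# Irregular toy graph: a triangle (nodes 1, 2, 3) with a pendant node 0.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
P, pi = merw(A)

# 2-regular toy graph: a cycle of 5 nodes.
n = 5
C = np.zeros((n, n))
for i in range(n):
    C[i, (i + 1) % n] = C[(i + 1) % n, i] = 1.0
P_cycle, pi_cycle = merw(C)
P_grw_cycle = C / C.sum(axis=1, keepdims=True)   # GRW matrix, Eq. (2)
```

On the cycle the two stochastic matrices agree entry by entry, while on the irregular graph `pi` differs from the degree-proportional GRW distribution.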
III. (DIS)SIMILARITY MATRICES FOR COMMUNITY FINDING ALGORITHMS

Methods of both assessing centrality [18] and finding communities [19], [20] have widely utilised calculating powers
of the stochastic matrix. The one by Latapy and Pons [20] uses the dissimilarity matrix



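$$r_{ij}(t) = \sqrt{\sum_k \frac{\left[(P^t)_{ik} - (P^t)_{jk}\right]^2}{\pi_k}} , \tag{7}$$

where the division by $\pi_k$ is supposed to reduce the effect of a vertex's centrality. Originally, $P$ and $\pi$ corresponding
to GRW were chosen.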
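
A sketch of such a dissimilarity matrix is given below. It assumes the form used in the Pons-Latapy (Walktrap) construction, $r_{ij}(t) = \sqrt{\sum_k [(P^t)_{ik} - (P^t)_{jk}]^2 / \pi_k}$, applied here to GRW. The toy graph (two triangles joined by one edge) and the name `walk_distance` are illustrative assumptions.

```python
import numpy as np

def walk_distance(P, pi, t):
    """Assumed form: r_ij(t) = sqrt( sum_k (P^t_ik - P^t_jk)^2 / pi_k )."""
    Pt = np.linalg.matrix_power(P, t)
    diff = Pt[:, None, :] - Pt[None, :, :]       # diff[i, j, k] = P^t_ik - P^t_jk
    return np.sqrt((diff ** 2 / pi).sum(axis=2))

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

k = A.sum(axis=1)
P = A / k[:, None]          # GRW stochastic matrix, Eq. (2)
pi = k / k.sum()            # GRW stationary distribution
r = walk_distance(P, pi, t=3)
```

In this example nodes within the same triangle end up closer to each other than nodes in different triangles, e.g. `r[0, 1] < r[0, 5]`. Passing the MERW matrix and its stationary state instead of the GRW ones gives the substituted variant.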

Another approach is an explicit use of the mean first-passage times (MFPT) [21]-[23]. The MFPT matrix $M$ is a useful
and well-studied quantity characterising RWs. Its construction with the use of the fundamental matrix $Z$ is given in [?]:

$$Z = (\mathbf{1} - P + e \pi^T)^{-1} , \tag{8}$$

$$M = (E Z_d - Z) D , \tag{9}$$

where $\mathbf{1}$ is the identity matrix, $e = (1, 1, \ldots, 1)^T$, $E$ is a matrix of all ones, $Z_d$ is a diagonal matrix with elements
$(Z_d)_{ii} = Z_{ii}$, and $D$ is a diagonal matrix with elements $(D)_{ii} = 1/\pi_i$. The elements $M_{ij}$ encode the average time to
reach the vertex $j$ from $i$ for the first time (in general $M_{ij} \neq M_{ji}$).
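
Equations (8) and (9) translate almost literally into matrix code. The following sketch uses GRW on a small toy graph; the graph and the name `mfpt_matrix` are assumptions for illustration.

```python
import numpy as np

def mfpt_matrix(P, pi):
    """Mean first-passage times from the fundamental matrix, Eqs. (8)-(9)."""
    n = len(pi)
    e = np.ones((n, 1))
    Z = np.linalg.inv(np.eye(n) - P + e @ pi[None, :])    # Eq. (8)
    E = np.ones((n, n))
    Zd = np.diag(np.diag(Z))                              # diagonal part of Z
    D = np.diag(1.0 / pi)
    return (E @ Zd - Z) @ D                               # Eq. (9)

# Hypothetical toy graph: a triangle (nodes 1, 2, 3) with a pendant node 0.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
k = A.sum(axis=1)
P = A / k[:, None]        # GRW, Eq. (2)
pi = k / k.sum()

M = mfpt_matrix(P, pi)
```

For this graph the walker at the pendant node 0 must hop to node 1, so `M[0, 1]` equals 1, whereas reaching node 0 from node 1 takes 7 steps on average, which illustrates the asymmetry of $M$.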

The last approach we discuss is a similarity matrix containing the average number of paths between two given nodes
(which is just $A^t$) with weights that depend on the paths' length $t$:

$$G(\mu) = \sum_t e^{-\mu t} A^t . \tag{10}$$

For $e^\mu = \lambda > \lambda_0$ the sum is convergent and can be carried out with the use of spectral decomposition of $A$.
From the point of view of paths' statistics, $G(\mu)$ defines the grand-canonical ensemble of paths. An element $G_{ij}(\mu)$
corresponds to the grand-canonical partition function, $\mu$ to the chemical potential, and the average path length is
$\langle t \rangle_\mu = -\partial_\mu \ln G_{ij}(\mu)$. To avoid conflicting notation, henceforth we use $\lambda = e^\mu$, whereas the symbol $\mu$ will be exclusively
reserved for the mixing parameter of benchmark graphs (see Sec. IV A).
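
The spectral evaluation of the sum (10) can be sketched as follows. The sketch assumes that the sum starts at $t = 0$, so that each eigenmode contributes a geometric series $1/(1 - \lambda_n/\lambda)$; the toy graph, the choice $\lambda = 1.5\,\lambda_0$ and the name `path_count_matrix` are illustrative.

```python
import numpy as np

def path_count_matrix(A, lam):
    """G = sum_{t>=0} lam^{-t} A^t via the spectral decomposition of A (needs lam > lam_0)."""
    vals, vecs = np.linalg.eigh(A)
    if lam <= vals[-1]:
        raise ValueError("the series diverges unless lam exceeds the largest eigenvalue")
    weights = 1.0 / (1.0 - vals / lam)        # geometric series summed per eigenmode
    return (vecs * weights) @ vecs.T

# Hypothetical toy graph: a triangle (nodes 1, 2, 3) with a pendant node 0.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
lam0 = np.linalg.eigvalsh(A)[-1]
lam = 1.5 * lam0

G = path_count_matrix(A, lam)

# Direct (truncated) evaluation of the series, for comparison only.
G_series = sum(np.linalg.matrix_power(A, t) / lam ** t for t in range(200))
```

Under this assumption the result is the resolvent $(\mathbf{1} - A/\lambda)^{-1}$, and the truncated series agrees with it to numerical precision.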

In the case of MERW and GRW (generally, for any RW for which $D^{-1/2} P D^{1/2}$ is symmetric) it can be shown that
these three quantities are intimately related, constituting a common framework for a number of centrality measures [?].



IV. COMPARISON OF COMMUNITY FINDING ALGORITHMS 



Each of the above quantities has an analogous centrality measure: $r$ has the stationary state centrality and centralities
defined by summation of powers of the stochastic matrix, $G$ has the eigenvector centrality and centralities defined
by path enumeration, and $M$ has a centrality defined by the inverse of its average rows [26]. These are natural
counterparts to some community finding methods.

Just as centrality may be defined with the use of the principal eigenvector of the adjacency matrix or the stochastic
matrix (then the eigenvector is the stationary state), there is a family of community finding methods analysing the
rest of the eigenvectors (often it is the spectrum of the Laplacian that is analysed) [27]-[32]. However, having the two
random walks at hand, we are more interested in methods that utilise their characteristics. In particular, we try to
assess what difference it makes when we switch between those two random walks.

There are a number of methods using powers of the transition matrix. For instance, the authors of [19] use the matrix

$$P^{\le T} = \sum_{t=1}^{T} P^t , \tag{11}$$

where $P$ corresponded to GRW, and $T$ was taken to be around 2-3. The assumption is that two nodes are close to each
other if the corresponding rows of the $P^{\le T}$ matrix are similar. One of the proposed similarity functions between two
vectors is

$$\mathrm{sim}(x, y) = \exp\Big(2T - \sum_i |x_i - y_i|\Big) - 1 . \tag{12}$$
In this formula, if $T = 1$ the vectors $x$, $y$ are rows of the stochastic matrix, hence the elements of each of them sum up
to 1. There are $T$ stochastic matrices summed in (11), hence in general the elements of each vector sum up to $T$. If
the two vectors are maximally different, the sum in (12) becomes $2T$, and the similarity reaches the lower boundary
value of 0.
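
The summed powers (11) and the similarity function (12) can be sketched as below, here for GRW with $T = 2$. The toy graph and the names `summed_powers` and `sim` are assumptions made for illustration; the iterative edge-reweighting procedure itself is not reproduced.

```python
import numpy as np

def summed_powers(P, T):
    """Eq. (11): sum of the powers P^1 + ... + P^T."""
    total = np.zeros_like(P)
    Pt = np.eye(len(P))
    for _ in range(T):
        Pt = Pt @ P
        total += Pt
    return total

def sim(x, y, T):
    """Eq. (12): exp(2T - L1 distance) - 1 for rows of the summed-powers matrix."""
    return np.exp(2 * T - np.abs(x - y).sum()) - 1.0

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
P = A / A.sum(axis=1, keepdims=True)      # GRW, Eq. (2)

T = 2
S = summed_powers(P, T)
same_community = sim(S[0], S[1], T)       # nodes 0 and 1 share a triangle
different_community = sim(S[0], S[5], T)  # nodes 0 and 5 do not
```

Each row of `S` sums to `T`, and two rows with disjoint supports give a similarity of exactly 0, in line with the remark on the lower boundary value.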

The algorithm consists in replacing edge weights of the original graph with the elements of the similarity matrix,
so that external (intercommunity) links get smaller weights, and the internal ones get larger weights. The procedure
is iterated until the differences between weights become large enough, and the weights below a given threshold can
be disposed of. What remains are the communities. It is viable to use the transition matrix of MERW only in the first
iteration step. As illustrated in Fig. 1, MERW produces slightly better results, especially for considerable $\mu$. (Details
of the comparisons are described in Sec. IV A.)
FIG. 1. Comparison of community detection efficiency between MERW (squares; T = 2, 4 iterations) and GRW (circles; T = 3,
3 iterations) transition matrix used in the first iteration of algorithm [19] on benchmark graphs. Graph size: (a) N = 200, (b)
N = 1000.



Next, Pons and Latapy [20] introduced an algorithm using as a distance matrix between nodes of the graph the
quantity given in (7). As shown in Fig. 2, MERW considerably decreases efficiency of the algorithm for small $\mu$; the precise reasons
for that are not established.
FIG. 2. Comparison of community detection efficiency between MERW (squares; summed powers of P, t = 1-3) and GRW
(circles; t = 3) for the algorithm of Pons and Latapy [20]. Graph size: (a) N = 200, (b) N = 1000.
In (10), the weights $e^{-\mu t}$ produce the resolvent operator of $A$, but also factorial weights $\beta^t/t!$ might be introduced
[33, 34], yielding the heat kernel. To analyse the resulting matrix one needs to remove the zeroth eigenmode of $A$, so
that $G$ is well-defined. The choice $\lambda = \lambda_0$ is directly related to MERW.

The procedure [33, 34] goes on, producing a matrix with 0s and 1s in place of negative and positive entries of $G$.
The original idea involved finding all maximal cliques (maximal complete subgraphs) of the graph represented by this
matrix. Since this is computationally strenuous, we use a much simpler approach and carry out hierarchical clustering
on that matrix. To obtain communities, we take the dendrogram section which maximises modularity [35]. This
algorithm, however, should be considered as only a very rough approach, just for the sake of preliminary comparison.
It can be seen in Fig. 3 that exponential weights work better for small $\mu$, while factorial weights give a reasonable
performance for larger values of the mixing parameter.
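
The first two steps of this procedure (removing the zeroth eigenmode at $\lambda = \lambda_0$ and thresholding by sign) might be sketched as follows. This is a rough illustration under stated assumptions: the sum over path lengths is taken from $t = 0$, the graph is connected so that $\lambda_0$ is non-degenerate, and the toy graph and the name `sign_matrix` are hypothetical. The subsequent hierarchical clustering and modularity maximisation are not shown.

```python
import numpy as np

def sign_matrix(A):
    """Exponentially weighted path matrix at lam = lam_0 without the zeroth eigenmode,
    and its 0/1 version marking negative/positive entries."""
    vals, vecs = np.linalg.eigh(A)                     # ascending eigenvalues
    lam0 = vals[-1]
    rest_vals, rest_vecs = vals[:-1], vecs[:, :-1]     # drop the principal (zeroth) eigenmode
    weights = 1.0 / (1.0 - rest_vals / lam0)           # finite because rest_vals < lam0
    G = (rest_vecs * weights) @ rest_vecs.T
    G = 0.5 * (G + G.T)                                # remove rounding asymmetry
    return G, (G > 0).astype(int)

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

G, B = sign_matrix(A)
```

Because the remaining eigenmodes are orthogonal to the strictly positive principal eigenvector, every row of `G` contains both positive and negative entries, so `B` is never trivially all ones.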
FIG. 3. Comparison between $\lambda_0$ (MERW, squares) and $t!$ (circles) path weights. Graph size: (a) N = 200, (b) N = 1000.

Lastly, one may look at methods grouping nodes according to their MFPT values. In [21]-[23] a similarity matrix is
introduced that computes the total of differences between MFPTs of random walkers incoming to given nodes $a$ and
$b$ from any initial node:

$$\Lambda_{ab} = \frac{\sqrt{\sum_{c \neq a, b}^{N} (M_{ca} - M_{cb})^2}}{N - 2} . \tag{13}$$

On this basis the authors developed an algorithm called Netwalk. We skip the details of the algorithm and refer the
reader to the original papers. In this case, the comparison between MFPTs of different random walks
(we also implement a biased random walk used originally by Netwalk), shown in Fig. 4, indicates that MERW should not be used
in this algorithm. The original algorithm, however, works well only for very small $\mu$, and in general its performance
is unexpectedly unreliable even for large network size.
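
A sketch of the MFPT-based matrix of Eq. (13) is given below. It assumes the root-sum-of-squares form of the dissimilarity index, $\Lambda_{ab} = \sqrt{\sum_{c \neq a,b} (M_{ca} - M_{cb})^2}/(N-2)$; the function name `mfpt_dissimilarity` and the hand-made input matrix are illustrative only, and the remaining steps of Netwalk are not reproduced.

```python
import numpy as np

def mfpt_dissimilarity(M):
    """Assumed form of Eq. (13): compare the times needed to reach a and b from every other node c."""
    n = M.shape[0]
    Lam = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            mask = np.ones(n, dtype=bool)
            mask[[a, b]] = False                       # sum over c different from a and b
            diff = M[mask, a] - M[mask, b]
            Lam[a, b] = np.sqrt((diff ** 2).sum()) / (n - 2)
    return Lam

# Hand-made 3x3 "MFPT" matrix, used only to exercise the formula.
M = np.array([[0.0, 1.0, 2.0],
              [3.0, 0.0, 4.0],
              [5.0, 6.0, 0.0]])
Lam = mfpt_dissimilarity(M)
```

Any MFPT matrix, whether built from GRW, MERW or a biased random walk, can be passed in place of the hand-made `M`.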



A. Benchmark graphs 

The algorithms in Sec. IV are compared with the use of unweighted undirected LFR benchmark graphs introduced
in [36] in a manner analogous to the authors' later work [37]. We take 100 benchmark graphs with N = 200, 1000
nodes; their exponents for the degree distribution and for the community size distribution are respectively $\tau_1 = -2$
and $\tau_2 = -1$. For N = 200 the parameters are: the average degree of 10, maximum degree of 30, and the minimum
and maximum community sizes are taken to be 5 and 35. For N = 1000: the average degree of 20, maximum degree
of 50, and the minimum and maximum community sizes are 20 and 100, respectively. The mixing parameter $\mu$ is the
fraction of links a given node shares with the nodes outside its community. The parameter is approximately equal
for all nodes in a graph, and its values are set to $\mu$ = 0.1-0.6. At the upper bound, most of the algorithms start to
have severe problems with detecting communities. To check how good a partition has been found we use the normalised
mutual information [38] with respect to the partition planted in the benchmark. Let us note that the definition of a
community here relies on the planted partition model, which means that the performance of algorithms is checked in
accordance with this particular definition.
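
For illustration, the normalised mutual information between a found partition and a planted one can be computed as in the sketch below. It assumes the common normalisation $2 I(X;Y) / [H(X) + H(Y)]$, which may differ in detail from the definition in [38]; the function names and the example labels are hypothetical.

```python
from collections import Counter
import numpy as np

def entropy(labels):
    """Shannon entropy of a partition given as a list of community labels."""
    n = len(labels)
    p = np.array(list(Counter(labels).values())) / n
    return -np.sum(p * np.log(p))

def nmi(x, y):
    """Normalised mutual information, assumed here as 2 I(X;Y) / (H(X) + H(Y))."""
    n = len(x)
    cx, cy, joint = Counter(x), Counter(y), Counter(zip(x, y))
    mi = sum(c / n * np.log(c * n / (cx[a] * cy[b])) for (a, b), c in joint.items())
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 1.0           # both partitions consist of a single community
    return 2.0 * mi / (hx + hy)

planted = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]    # same partition with different labels
unrelated = [0, 1, 0, 1]     # statistically independent of the planted one

score_same = nmi(planted, relabelled)
score_unrelated = nmi(planted, unrelated)
```

The measure depends only on the grouping of nodes, not on the labels, so a perfectly recovered planted partition scores 1 and an unrelated one scores 0.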
FIG. 4. Comparison between Netwalk [23] using MERW (squares), GRW (circles) and biased RW (diamonds) on benchmark
graphs. Graph size: (a) N = 200, (b) N = 1000.




V. CONCLUSIONS 



We have briefly introduced the concept of maximal-entropy random walk and reviewed some of its features, while
in the main body of this paper we compared the performance of several community finding algorithms, in which
MERW-based (dis)similarity matrices substituted the original ones.

The results obtained by the most reliable method checked here, that of Latapy and Pons, are comparable for
GRW and MERW, although we note significant worsening for small networks when using the latter.

The other methods have not been previously compared on LFR benchmark graphs. The one by Harel and Koren is
generally not reliable for $\mu > 0.4$. However, its performance is slightly improved by MERW for both small and large
networks. In contrast, MERW is not suited for Netwalk. Even for GRW, which was used originally, this algorithm
produces markedly unsatisfactory results for the medium range of the mixing parameter in comparison to available
state-of-the-art methods. The method based on factorial path weighting has considerable problems for small $\mu$.
Surprisingly, switching to exponential weighting, which corresponds to MERW, produces better results than Netwalk.
In general, it performs reasonably well, even though the algorithm used simple hierarchical clustering as a temporary
means for the sake of comparison.

Whereas MERW exhibits surprising localisation and relaxation properties on some defective regular graphs, this case
study shows that on the LFR benchmark graphs, which are locally random, this random walk can offer a performance
of community finding methods comparable to that of GRW. It remains to be investigated whether the behaviour of MERW
on other types of graphs, including real-world networks, is more distinctive. Further effort is also needed to determine
whether development of a dedicated algorithm which makes better use of the information contained in this type of
random walk is possible.